Name : Manda Sagar¶

Roll No : 6182¶

Class : SYIT B¶

Subject : Data Science¶

Topic : Fifa_19 players Analysis.¶

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
import plotly.express as px
import numpy as np
plt.rcParams["figure.figsize"]=[20,10]
d=pd.read_csv("C:\\Users\\Sagar\\Downloads\\fifa_eda_stats.csv")
In [21]:
d.head()
Out[21]:
Name Age Nationality Overall Potential Club Value Wage Preferred Foot International Reputation ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
0 L. Messi 31 Argentina 94 94 FC Barcelona €110.5M €565K Left 5.0 ... 59.0 94.0 48.0 22.0 94.0 94.0 75.0 96.0 28.0 26.0
1 Cristiano Ronaldo 33 Portugal 94 94 Juventus €77M €405K Right 5.0 ... 79.0 93.0 63.0 29.0 95.0 82.0 85.0 95.0 31.0 23.0
2 Neymar Jr 26 Brazil 92 93 Paris Saint-Germain €118.5M €290K Right 5.0 ... 49.0 82.0 56.0 36.0 89.0 87.0 81.0 94.0 24.0 33.0
3 De Gea 27 Spain 91 93 Manchester United €72M €260K Right 4.0 ... 64.0 12.0 38.0 30.0 12.0 68.0 40.0 68.0 21.0 13.0
4 K. De Bruyne 27 Belgium 91 92 Manchester City €102M €355K Right 4.0 ... 75.0 91.0 76.0 61.0 87.0 94.0 79.0 88.0 58.0 51.0

5 rows × 44 columns

In [22]:
d.tail()
Out[22]:
Name Age Nationality Overall Potential Club Value Wage Preferred Foot International Reputation ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
18202 J. Lundstram 19 England 47 65 Crewe Alexandra €60K €1K Right 1.0 ... 47.0 38.0 46.0 46.0 39.0 52.0 43.0 45.0 48.0 47.0
18203 N. Christoffersson 19 Sweden 47 63 Trelleborgs FF €60K €1K Right 1.0 ... 67.0 42.0 47.0 16.0 46.0 33.0 43.0 42.0 15.0 19.0
18204 B. Worman 16 England 47 67 Cambridge United €60K €1K Right 1.0 ... 32.0 45.0 32.0 15.0 48.0 43.0 55.0 41.0 13.0 11.0
18205 D. Walker-Rice 17 England 47 66 Tranmere Rovers €60K €1K Right 1.0 ... 48.0 34.0 33.0 22.0 44.0 47.0 50.0 46.0 25.0 27.0
18206 G. Nugent 16 England 46 66 Tranmere Rovers €60K €1K Right 1.0 ... 60.0 32.0 56.0 42.0 34.0 49.0 33.0 43.0 43.0 50.0

5 rows × 44 columns

In [23]:
d.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 17918 entries, 0 to 18206
Data columns (total 44 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Name                      17918 non-null  object 
 1   Age                       17918 non-null  int64  
 2   Nationality               17918 non-null  object 
 3   Overall                   17918 non-null  int64  
 4   Potential                 17918 non-null  int64  
 5   Club                      17918 non-null  object 
 6   Value                     17918 non-null  object 
 7   Wage                      17918 non-null  object 
 8   Preferred Foot            17918 non-null  object 
 9   International Reputation  17918 non-null  float64
 10  Weak Foot                 17918 non-null  float64
 11  Work Rate                 17918 non-null  object 
 12  Body Type                 17918 non-null  object 
 13  Position                  17918 non-null  object 
 14  Height                    17918 non-null  object 
 15  Weight                    17918 non-null  object 
 16  Crossing                  17918 non-null  float64
 17  Finishing                 17918 non-null  float64
 18  HeadingAccuracy           17918 non-null  float64
 19  ShortPassing              17918 non-null  float64
 20  Volleys                   17918 non-null  float64
 21  Dribbling                 17918 non-null  float64
 22  Curve                     17918 non-null  float64
 23  FKAccuracy                17918 non-null  float64
 24  LongPassing               17918 non-null  float64
 25  BallControl               17918 non-null  float64
 26  Acceleration              17918 non-null  float64
 27  SprintSpeed               17918 non-null  float64
 28  Agility                   17918 non-null  float64
 29  Reactions                 17918 non-null  float64
 30  Balance                   17918 non-null  float64
 31  ShotPower                 17918 non-null  float64
 32  Jumping                   17918 non-null  float64
 33  Stamina                   17918 non-null  float64
 34  Strength                  17918 non-null  float64
 35  LongShots                 17918 non-null  float64
 36  Aggression                17918 non-null  float64
 37  Interceptions             17918 non-null  float64
 38  Positioning               17918 non-null  float64
 39  Vision                    17918 non-null  float64
 40  Penalties                 17918 non-null  float64
 41  Composure                 17918 non-null  float64
 42  StandingTackle            17918 non-null  float64
 43  SlidingTackle             17918 non-null  float64
dtypes: float64(30), int64(3), object(11)
memory usage: 6.2+ MB
In [24]:
d.describe()
Out[24]:
Age Overall Potential International Reputation Weak Foot Crossing Finishing HeadingAccuracy ShortPassing Volleys ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
count 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 ... 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000 17918.000000
mean 25.105257 66.236801 71.329334 1.113908 2.947260 49.748856 45.581147 52.295290 58.713417 42.932135 ... 65.323697 47.130316 55.879060 46.690870 49.995758 53.448934 48.544480 58.655263 47.684005 45.643208
std 4.675372 6.929243 6.144098 0.395495 0.660106 18.354989 19.512533 17.367823 14.680340 17.688194 ... 12.552242 19.251517 17.354347 20.691841 19.521104 14.119193 15.691563 11.420965 21.647674 21.270735
min 16.000000 46.000000 48.000000 1.000000 1.000000 5.000000 2.000000 4.000000 7.000000 4.000000 ... 17.000000 3.000000 11.000000 3.000000 2.000000 10.000000 5.000000 3.000000 2.000000 3.000000
25% 21.000000 62.000000 67.000000 1.000000 3.000000 38.000000 30.000000 44.000000 54.000000 30.000000 ... 58.000000 33.000000 44.000000 26.000000 39.000000 44.000000 39.000000 51.000000 27.000000 24.000000
50% 25.000000 66.000000 71.000000 1.000000 3.000000 54.000000 49.000000 56.000000 62.000000 44.000000 ... 67.000000 51.000000 59.000000 52.000000 55.000000 55.000000 49.000000 60.000000 55.000000 52.000000
75% 28.000000 71.000000 75.000000 1.000000 3.000000 64.000000 62.000000 64.000000 68.000000 57.000000 ... 74.000000 62.000000 69.000000 64.000000 64.000000 64.000000 60.000000 67.000000 66.000000 64.000000
max 45.000000 94.000000 95.000000 5.000000 5.000000 93.000000 95.000000 94.000000 93.000000 90.000000 ... 97.000000 94.000000 95.000000 92.000000 95.000000 94.000000 92.000000 96.000000 93.000000 91.000000

8 rows × 33 columns

In [3]:
#Data Cleaning
d.isnull().sum()
Out[3]:
Name                          0
Age                           0
Nationality                   0
Overall                       0
Potential                     0
Club                        241
Value                         0
Wage                          0
Preferred Foot               48
International Reputation     48
Weak Foot                    48
Work Rate                    48
Body Type                    48
Position                     60
Height                       48
Weight                       48
Crossing                     48
Finishing                    48
HeadingAccuracy              48
ShortPassing                 48
Volleys                      48
Dribbling                    48
Curve                        48
FKAccuracy                   48
LongPassing                  48
BallControl                  48
Acceleration                 48
SprintSpeed                  48
Agility                      48
Reactions                    48
Balance                      48
ShotPower                    48
Jumping                      48
Stamina                      48
Strength                     48
LongShots                    48
Aggression                   48
Interceptions                48
Positioning                  48
Vision                       48
Penalties                    48
Composure                    48
StandingTackle               48
SlidingTackle                48
dtype: int64
In [2]:
d.isnull()
Out[2]:
Name Age Nationality Overall Potential Club Value Wage Preferred Foot International Reputation ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
0 False False False False False False False False False False ... False False False False False False False False False False
1 False False False False False False False False False False ... False False False False False False False False False False
2 False False False False False False False False False False ... False False False False False False False False False False
3 False False False False False False False False False False ... False False False False False False False False False False
4 False False False False False False False False False False ... False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18202 False False False False False False False False False False ... False False False False False False False False False False
18203 False False False False False False False False False False ... False False False False False False False False False False
18204 False False False False False False False False False False ... False False False False False False False False False False
18205 False False False False False False False False False False ... False False False False False False False False False False
18206 False False False False False False False False False False ... False False False False False False False False False False

18207 rows × 44 columns

In [2]:
#we are using inplace parameter to make it apply to the permanent version of our data frame
d.dropna(inplace=True)
In [3]:
d.isnull().sum()
Out[3]:
Name                        0
Age                         0
Nationality                 0
Overall                     0
Potential                   0
Club                        0
Value                       0
Wage                        0
Preferred Foot              0
International Reputation    0
Weak Foot                   0
Work Rate                   0
Body Type                   0
Position                    0
Height                      0
Weight                      0
Crossing                    0
Finishing                   0
HeadingAccuracy             0
ShortPassing                0
Volleys                     0
Dribbling                   0
Curve                       0
FKAccuracy                  0
LongPassing                 0
BallControl                 0
Acceleration                0
SprintSpeed                 0
Agility                     0
Reactions                   0
Balance                     0
ShotPower                   0
Jumping                     0
Stamina                     0
Strength                    0
LongShots                   0
Aggression                  0
Interceptions               0
Positioning                 0
Vision                      0
Penalties                   0
Composure                   0
StandingTackle              0
SlidingTackle               0
dtype: int64
In [4]:
d.shape
Out[4]:
(17918, 44)
In [36]:
#Assigning a variable to x(Age) to plot a univariate distribution along x-axis

sns.histplot(x=d["Age"],hue=d["Preferred Foot"],multiple="stack",palette="Spectral",edgecolor="k")
plt.title("Distribution of Age of the players Based on their Preferred foot",fontsize=18)
plt.xlabel("Age of players",fontsize=15)
plt.ylabel("Count",fontsize=15)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show()

Hence it can be concluded that count of players with age 20-26 is maximum and their preferred foot is right.

In [35]:
d["Age"].mean()
Out[35]:
25.122205745043114
In [41]:
d["Nationality"].value_counts().head(20).plot(kind="bar")
plt.title("Top 20 nations with maximum Players",fontsize=18)
plt.xlabel("Nations",fontsize=18)
plt.ylabel("Count",fontsize=18)
plt.xticks(fontsize=18)
plt.yticks(fontsize=15)
plt.show()
In [6]:
#data cleaning
fifa=d.copy()
In [7]:
def str2float(euros):
    if euros[-1]=="M":
        return float(euros[1:-1])*1000000
    elif euros[-1]=="K":
        return float(euros[1:-1])*1000
    else:
        return float(euros[1:])
    
fifa['Value']=fifa['Value'].apply(lambda x: str2float(x))
fifa['Wage']=fifa['Wage'].apply(lambda x: str2float(x))
In [8]:
fifa[["Name","Value","Wage"]]
Out[8]:
Name Value Wage
0 L. Messi 110500000.0 565000.0
1 Cristiano Ronaldo 77000000.0 405000.0
2 Neymar Jr 118500000.0 290000.0
3 De Gea 72000000.0 260000.0
4 K. De Bruyne 102000000.0 355000.0
... ... ... ...
18202 J. Lundstram 60000.0 1000.0
18203 N. Christoffersson 60000.0 1000.0
18204 B. Worman 60000.0 1000.0
18205 D. Walker-Rice 60000.0 1000.0
18206 G. Nugent 60000.0 1000.0

17918 rows × 3 columns

In [42]:
#Count of players based on their height
sns.countplot(x=d["Height"],edgecolor="k",palette="Blues")
sns.set_theme(style="darkgrid")
plt.xlabel("Height",fontsize=15,weight="bold")
plt.ylabel("Number of players",fontsize=15,weight="bold")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.title("Count of players based on their height",fontsize=18)
plt.show()

Players of height 6'0 foot have the highest count. Around 2700 have height of 6'0 foot¶

Comparing the best players

In [15]:
skills=[]
for i in d.columns:
    skills.append(i)
In [16]:
skills
Out[16]:
['Name',
 'Age',
 'Nationality',
 'Overall',
 'Potential',
 'Club',
 'Value',
 'Wage',
 'Preferred Foot',
 'International Reputation',
 'Weak Foot',
 'Work Rate',
 'Body Type',
 'Position',
 'Height',
 'Weight',
 'Crossing',
 'Finishing',
 'HeadingAccuracy',
 'ShortPassing',
 'Volleys',
 'Dribbling',
 'Curve',
 'FKAccuracy',
 'LongPassing',
 'BallControl',
 'Acceleration',
 'SprintSpeed',
 'Agility',
 'Reactions',
 'Balance',
 'ShotPower',
 'Jumping',
 'Stamina',
 'Strength',
 'LongShots',
 'Aggression',
 'Interceptions',
 'Positioning',
 'Vision',
 'Penalties',
 'Composure',
 'StandingTackle',
 'SlidingTackle']
In [17]:
skill=[ 'Crossing',
 'Finishing',
 'HeadingAccuracy',
 'ShortPassing',
 'Volleys',
 'Dribbling',
 'Curve',
 'FKAccuracy',
 'LongPassing',
 'BallControl',
 'Acceleration',
 'SprintSpeed',
 'Agility',
 'Reactions',
 'Balance',
 'ShotPower',
 'Jumping',
 'Stamina',
 'Strength',
 'LongShots',
 'Aggression',
 'Interceptions',
 'Positioning',
 'Vision',
 'Penalties',
 'Composure',
 'StandingTackle',
 'SlidingTackle']     

#Based on these skills we are comparing the best players(Messi nad ronaldo)
In [18]:
messi=d.loc[d["Name"]=="L. Messi"]
messi=pd.DataFrame(messi,columns=skill)

ronaldo=d.loc[d["Name"]=="Cristiano Ronaldo"]
ronaldo=pd.DataFrame(ronaldo,columns=skill)
In [19]:
messi
Out[19]:
Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
0 84.0 95.0 70.0 90.0 86.0 97.0 93.0 94.0 87.0 96.0 ... 59.0 94.0 48.0 22.0 94.0 94.0 75.0 96.0 28.0 26.0

1 rows × 28 columns

In [20]:
ronaldo
Out[20]:
Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
1 84.0 94.0 89.0 81.0 87.0 88.0 81.0 76.0 77.0 94.0 ... 79.0 93.0 63.0 29.0 95.0 82.0 85.0 95.0 31.0 23.0

1 rows × 28 columns

In [21]:
sns.pointplot(data=messi,color="blue")
sns.pointplot(data=ronaldo,color="red")
plt.xticks(rotation=90,fontsize=14)
plt.yticks(fontsize=14)
plt.title("Messi Vs Ronaldo",fontsize=20)
plt.xlabel("Skills",fontsize=20)
plt.ylabel("Skills Value",fontsize=20)
plt.grid()

Messi is winning since skill Values of messi are greater than that of ronaldo here.¶

Top 5 nations with overall best players

In [22]:
t_nations=d.groupby(['Nationality'])["Overall"].max().sort_values(ascending=False).head(5)
#we are grouping 2 columns nationality and overall and taking the max value and sorting the first 5 by descending order
In [23]:
t_nations
Out[23]:
Nationality
Argentina    94
Portugal     94
Brazil       92
Croatia      91
Uruguay      91
Name: Overall, dtype: int64

Top 5 clubs with overall best players

In [24]:
t_clubs=d.groupby(['Club'])["Overall"].max().sort_values(ascending=False).head(5)
In [25]:
t_clubs
Out[25]:
Club
Juventus               94
FC Barcelona           94
Paris Saint-Germain    92
Chelsea                91
Manchester United      91
Name: Overall, dtype: int64
In [27]:
# Age distribution of players in Countries

countries_name=("Argentina","Portugal","Brazil","Croatia","Uruguay")
c=d.loc[d["Nationality"].isin(countries_name) & d['Age']]
sns.boxenplot(x="Nationality",y="Age",data=c)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Countries",fontsize=15)
plt.ylabel("Age",fontsize=15)
plt.title("Age Distribution of players in Countries",fontsize=15)
plt.grid()

Argentina is the country which keeps the players from all age groups ranging from(18-40). And Most of the players from Argentina are in the age group of 22-32.

In [31]:
#Age distribution of Players in Clubs
sns.set_theme(style="darkgrid")
club_n=("Juventus","FC Barcelona","Paris Saint-Germain","Chelsea","Manchester United")
club=d.loc[d["Club"].isin(club_n) & d["Age"]]
sns.boxplot(x="Club",y="Age",data=club,dodge=True,palette="turbo",showmeans=True)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Clubs",fontsize=18)
plt.ylabel("Age",fontsize=15)
plt.title("Age Distribution of players in Clubs",fontsize=18)
plt.grid()

Here the TOP 5 Clubs are taken and see Juventus their young players are tooo less their maximum players age is above 25 years. And in Paris club young players can be seen on a large number.

In [9]:
#Nation wise Players count And Average potential        lambda is function
# we are grouping natinality and overall column it will consider all the rows of overall
#x will go to overall ratings colmn
best_avg_Overall=d.groupby("Nationality").apply(lambda x:np.average(x["Overall"])).reset_index(name="Overall Ratings")
best_avg_player=d.groupby("Nationality").apply(lambda x:x["Overall"].count()).reset_index(name="Player Counts")
best_avg_count=pd.merge(best_avg_Overall,best_avg_player,how="inner",left_on="Nationality",right_on="Nationality")
top=best_avg_count[best_avg_count["Player Counts"]>=200]
top.sort_values(by=["Overall Ratings","Player Counts"],ascending=False)
px.scatter(top,x="Overall Ratings",y="Player Counts",color="Player Counts",hover_data=["Nationality"])

Here the nationality England has the player count 1662 and overall rating of about 63.42

In [78]:
best_avg_count
Out[78]:
Nationality Overall Ratings Player Counts
0 Afghanistan 61.000000 4
1 Albania 65.925000 40
2 Algeria 70.633333 60
3 Andorra 62.000000 1
4 Angola 67.600000 15
... ... ... ...
159 Uzbekistan 67.500000 2
160 Venezuela 67.268657 67
161 Wales 64.139535 129
162 Zambia 65.222222 9
163 Zimbabwe 69.769231 13

164 rows × 3 columns

In [75]:
best_avg_player
Out[75]:
Nationality Player Counts
0 Afghanistan 4
1 Albania 40
2 Algeria 60
3 Andorra 1
4 Angola 15
... ... ...
159 Uzbekistan 2
160 Venezuela 67
161 Wales 129
162 Zambia 9
163 Zimbabwe 13

164 rows × 2 columns

In [97]:
sns.countplot(x=fifa["Value"],order=fifa.Value.value_counts().iloc[:5].index,palette="gist_heat")
Out[97]:
<AxesSubplot:xlabel='Value', ylabel='count'>
In [32]:
sns.boxplot(x=d["Age"],y=d["Potential"],hue=d["Preferred Foot"],palette="ocean")
plt.xlabel("AGE OF THE PLAYERS",fontsize=15,weight="bold")
plt.ylabel("POTENTIAL",fontsize=15,weight="bold")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.title("POTENTIAL OF THE PLAYERS WITH RESPECT TO THEIR AGE",fontsize=17,weight="bold")
plt.show()

Players between 20-32 who have the preferred foot as Left has the heighest potential then that of right foot players.

In [98]:
sns.scatterplot(x=fifa["Overall"],y=fifa["Value"],color="r",s=100,edgecolor="k",alpha=0.4)
sns.set_theme(style="darkgrid")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Overall Ratings",fontsize=15,weight="bold")
plt.ylabel("Value",fontsize=15,weight="bold")
plt.title("Relationship Between Overall Ratings and Value of the players",fontsize=20)
plt.show()

It can be concluded from the above graph that as the Overall ratings of the Players increaseAs Their values also increases.

In [97]:
sns.set_style("whitegrid")
sns.set_color_codes()
sns.kdeplot(data=d,x="Potential",hue="Preferred Foot",multiple="stack",palette="seismic",edgecolor="k")
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Potential",fontsize=15,weight="bold")
plt.title("KDE(Potential of the players based on thier Preferred Foot)",fontsize=18)
plt.show()

From the above visual,it can be observed that density of the potential of players is maximum at 70 . And players with preferred foot left has higher Potential than right ones.

In [43]:
sns.jointplot(data=fifa,x=fifa["Potential"],y=fifa["Age"],kind="reg",marker="+",marginal_ticks=True,
             marginal_kws=dict(bins=25,fill=False),color='m')
Out[43]:
<seaborn.axisgrid.JointGrid at 0x2da0622e190>

The Players within the age 22-30 have the highest Potential.

In [55]:
sns.stripplot(x=fifa["Work Rate"],y=fifa["Wage"],data=fifa,
             jitter=False,s=20,marker="D",linewidth=1,alpha=0.2,palette="seismic",edgecolor="k")
Out[55]:
<AxesSubplot:xlabel='Work Rate', ylabel='Wage'>
In [133]:
sns.pairplot(fifa[["Potential","Wage","Age","Value"]])
Out[133]:
<seaborn.axisgrid.PairGrid at 0x1e8766e6fa0>
In [69]:
plt.bar(list(fifa["Nationality"].value_counts()[0:5].keys()),list(fifa["Nationality"].value_counts()[0:5]),color="lightgreen",
        edgecolor="k")
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.show()
#key->only category

From the above graph we conclude that most of the players belong to ENGLAND more than 1600.

In [144]:
plt.bar(list(fifa["Name"])[0:5],list(fifa["Wage"])[0:5],color="lightblue",edgecolor="k")
#key->only category
Out[144]:
<BarContainer object of 5 artists>

L.Messi has the highest wage among all the players.

In [101]:
#weight vs driblling
plt.xlabel('Weight', fontsize=25)
plt.ylabel('Dribbling', fontsize=25)
plt.title('Weight vs Dribbling', fontsize = 25)
sns.barplot(x='Weight', y='Dribbling', data=d.sort_values('Weight'),palette="viridis",edgecolor="k")
plt.xticks(rotation=90,fontsize=16,weight="bold")
plt.yticks(weight="bold",fontsize=16)
plt.show()

From the above figure it can be concluded that as the weight goes on increasing the Dribbling skill of the players is decreasing and in very few players after 205lbs weight are good at dribbling.

In [14]:
sns.countplot(x = 'Work Rate', data = d, palette = 'hls',edgecolor="k")
plt.title('Different work rates of the Players Participating in the FIFA 2019', fontsize = 20)
plt.xlabel('Work rates associated with the players', fontsize = 16)
plt.ylabel('count of Players', fontsize = 16)
plt.show()

Players with work rate medium/medium has the maximum particition in FIFA 2019.

In [34]:
#Every Nations' Player and their overall scores
some_countries = ('England', 'Germany', 'Spain', 'Argentina', 'France', 'Brazil', 'Italy', 'Columbia') # defining a tuple consisting of country names
data_countries = d.loc[d['Nationality'].isin(some_countries) & d['Overall']] # extracting the overall data of the countries selected in the line above
data_countries.head()
Out[34]:
Name Age Nationality Overall Potential Club Value Wage Preferred Foot International Reputation ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
3 De Gea 27 Spain 91 93 Manchester United €72M €260K Right 4.0 ... 64.0 12.0 38.0 30.0 12.0 68.0 40.0 68.0 21.0 13.0
8 Sergio Ramos 32 Spain 91 91 Real Madrid €51M €380K Right 4.0 ... 83.0 59.0 88.0 90.0 60.0 63.0 75.0 82.0 92.0 91.0
14 N. Kanté 27 France 89 90 Chelsea €63M €225K Right 3.0 ... 76.0 69.0 90.0 92.0 71.0 79.0 54.0 85.0 91.0 85.0
15 P. Dybala 24 Argentina 89 94 Juventus €89M €205K Left 3.0 ... 65.0 88.0 48.0 32.0 84.0 87.0 86.0 84.0 20.0 20.0
16 H. Kane 24 England 89 91 Tottenham Hotspur €83.5M €205K Right 3.0 ... 84.0 85.0 76.0 35.0 93.0 80.0 90.0 89.0 36.0 38.0

5 rows × 44 columns

In [45]:
ax = sns.boxplot(x = data_countries['Nationality'], y = data_countries['Overall'], palette = 'spring') # creating a bargraph
ax.set_xlabel(xlabel = 'Countries', fontsize = 18)
ax.set_ylabel(ylabel = 'Overall Scores', fontsize = 18)
ax.set_title(label = 'Distribution of overall scores of players from different countries', fontsize = 20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show()

Brazil has the players with highest overall rating around more than 70 points

In [10]:
top_play=d[['Name','Overall',"Age",'Club']]
top_play.sort_values(by='Overall',ascending=False,inplace=True)
top_30_play=top_play[:100]
fig=px.scatter(top_30_play,x='Age',y='Overall',color='Age',size='Overall',hover_data=['Name','Club'],title='Top Football Players in the FIFA 19 game')
fig.show()
C:\Users\Sagar\AppData\Local\Temp\ipykernel_12284\1668211768.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

From the above graph we can conclude that the top Football Players from the FIFA 19 game are Messi and Ronaldo and their age lies between 30-35(based on Overall Ratings).

In [64]:
#Every Clubs' Player and their overall scores
some_clubs = ('CD Leganés', 'Southampton', 'RC Celta', 'Empoli', 'Fortuna Düsseldorf', 'Manchestar City',
             'Tottenham Hotspur', 'FC Barcelona', 'Valencia CF', 'Chelsea', 'Real Madrid') # creating a tuple of club names

data_clubs = d.loc[d['Club'].isin(some_clubs) & d['Overall']] # extracting the overall data of the clubs selected in the line above
data_clubs.head()
Out[64]:
Name Age Nationality Overall Potential Club Value Wage Preferred Foot International Reputation ... Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure StandingTackle SlidingTackle
5 E. Hazard 27 Belgium 91 91 Chelsea €93M €340K Right 4.0 ... 66.0 80.0 54.0 41.0 87.0 89.0 86.0 91.0 27.0 22.0
6 L. Modrić 32 Croatia 91 91 Real Madrid €67M €420K Right 4.0 ... 58.0 82.0 62.0 83.0 79.0 92.0 82.0 84.0 76.0 73.0
7 L. Suárez 31 Uruguay 91 91 FC Barcelona €80M €455K Right 5.0 ... 83.0 85.0 87.0 41.0 92.0 84.0 85.0 85.0 45.0 38.0
8 Sergio Ramos 32 Spain 91 91 Real Madrid €51M €380K Right 4.0 ... 83.0 59.0 88.0 90.0 60.0 63.0 75.0 82.0 92.0 91.0
14 N. Kanté 27 France 89 90 Chelsea €63M €225K Right 3.0 ... 76.0 69.0 90.0 92.0 71.0 79.0 54.0 85.0 91.0 85.0

5 rows × 44 columns

In [67]:
ax = sns.boxplot(x = data_clubs['Club'], y = data_clubs['Overall'], palette = 'inferno') # creating a boxplot
ax.set_xlabel(xlabel = 'Some Popular Clubs', fontsize = 15)
ax.set_ylabel(ylabel = 'Overall Ratings', fontsize = 15)
ax.set_title(label = 'Distribution of Overall Ratings in Different popular Clubs', fontsize = 20)
plt.xticks(rotation = 45,fontsize=16)
plt.yticks(fontsize=16)
plt.show()
#The club Real Madrid has highest Overall ratings than other Popular clubs.
In [41]:
sns.histplot(fifa["BallControl"],color="crimson",linewidth=2)
plt.xticks(fontsize=18)
plt.xlabel("Ball control of players")
plt.ylabel("Number of players")
plt.show()
In [26]:
ax=sns.heatmap(fifa.corr(),annot=True)
ax.set(xlabel=" ",ylabel=" ")
ax.xaxis.tick_top()
plt.xticks(rotation=90,fontsize=18)
plt.yticks(fontsize=15)
plt.show()
In [52]:
sns.histplot(fifa,x=fifa["International Reputation"],y=fifa["Overall"],bins=10,discrete=(True,False),
            log_scale=(False,True),cbar=True,edgecolor="k")
plt.xticks(fontsize=15)
plt.yticks(fontsize=18)
plt.xlabel("International Reputation",fontsize=18)
plt.ylabel("Overall Ratings",fontsize=18)
plt.show()
#Teams with Only one International reputation has the highest Overall Ratings.
In [31]:
fifa.hist(bins=50,color="c",figsize=(40,30),edgecolor="k")
plt.show()
In [47]:
sns.FacetGrid(fifa,hue="Position",height=4).map(plt.bar,"Preferred Foot","International Reputation").add_legend()
plt.show()
#International players with preferred foot as right has more opportunities than left foot ones.
In [55]:
sns.histplot(x="Position",data=d,hue="Position",palette="magma")
plt.xticks(fontsize=18,rotation=90)
plt.yticks(fontsize=18)
plt.show()
# players having the position of striker has the Maximum count than other Positions.
In [57]:
x1=fifa["Position"].value_counts().head(5)
print(x1)
ST    2130
GK    1992
CB    1754
CM    1377
LB    1305
Name: Position, dtype: int64
In [59]:
label=x1.index
explode=[0,0,0,0,0.2]
color=sns.color_palette("Pastel1")
plt.pie(x1,labels=label,data=fifa,autopct="%0.1f%%",explode=explode,colors=color,shadow=True,startangle=0,
       wedgeprops={"linewidth":1,"edgecolor":"k"})
plt.figure(figsize=(20,6))
plt.axis("equal")
plt.show()
In [12]:
x3=fifa["Work Rate"].value_counts().head(5)
print(x3)
Medium/ Medium    9685
High/ Medium      3131
Medium/ High      1660
High/ High        1007
Medium/ Low        840
Name: Work Rate, dtype: int64
In [38]:
l=x3.index
explode=[0,0,0.1,0,0]
color=sns.color_palette("Pastel2_r")
plt.pie(x3,labels=l,data=fifa,autopct="%1.3f%%",explode=explode,colors=color,shadow=True,startangle=0,
       wedgeprops={"linewidth":1,"edgecolor":"k"})
plt.figure(figsize=(20,6))
plt.axis("equal")
plt.show()
In [108]:
sns.stripplot(x="Preferred Foot",y="Overall",data=d,size=9,color="lightgreen",marker="^",
             edgecolor='k',alpha=0.9,linewidth=0.2)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel("Preferred Foot",fontsize=15,weight="bold")
plt.ylabel("Overall Rating",fontsize=15,weight="bold")
plt.title("Preferred Footwise Ratings of Players",fontsize=18)
plt.show()
#Players preferring the right foot has More Overall ratings than that of right one
In [110]:
sns.lineplot(x=fifa["Age"],y=fifa["Stamina"],hue=fifa["Preferred Foot"],palette="mako_r")
sns.set_theme(style="darkgrid")
plt.xlabel("Age",fontsize=15,weight="bold")
plt.ylabel("Stamina",fontsize=15,weight="bold")
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Realationship Between Age and Stamina",fontsize=18)
plt.show()

IN the above Figure,the relationship of Age and stamina of player was visualized and concluded that players at the age of 25-30 have the highest Stamina level(70),where players preferring the left foot has higher stamina then those preferring the right one.

In [62]:
sns.pointplot(x=d["Position"],y=d["Age"],errorbar=("pi",100),capsize=0.4,join=False,palette="rocket")
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel("Position",fontsize=15,weight="bold")
plt.ylabel("Age",fontsize=15,weight="bold")
plt.title("Position and Age of Players",fontsize=18)
plt.show()
In [47]:
sns.violinplot(x=d["Preferred Foot"],y=d["Jumping"],palette="cividis")
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel("Preferred Foot",fontsize=15,weight="bold")
plt.ylabel("Jumping",fontsize=15,weight="bold")
plt.title("Preferred Foot and Jumping of players",fontsize=18)
plt.show()

Here from the above graph we can conclude that Players Preferred with Right Foot have slightly good jump than left foot players.

In [61]:
sns.barplot(x="Body Type",y="Overall",data=d,palette="coolwarm",edgecolor="k")
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel("Body Type",fontsize=15,weight="bold")
plt.ylabel("Overall Ratings",fontsize=15,weight="bold")
plt.title("Body Type and Overall ratings of players",fontsize=18)
plt.show()
In [11]:
#Body type column
fifa["Body Type"].value_counts()
Out[11]:
Normal                 10436
Lean                    6351
Stocky                  1124
Messi                      1
C. Ronaldo                 1
Neymar                     1
Courtois                   1
PLAYER_BODY_TYPE_25        1
Shaqiri                    1
Akinfenwa                  1
Name: Body Type, dtype: int64
In [60]:
sns.barplot(x="Body Type",y="Overall",data=fifa,palette="cool",edgecolor="k")
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel("Body Type",fontsize=15,weight="bold")
plt.ylabel("Overall Ratings",fontsize=15,weight="bold")
plt.title("Body Type and Overall ratings of players",fontsize=18)
plt.show()

Conclusion:- • Players Preferred with Right Foot have slightly good jump than left foot players. • The relationship of Age and stamina of player was visualized and concluded that players at the age of 25-30 have the highest Stamina (70), where players preferring the left foot has higher stamina then those preferring the right one. • Players preferring the right foot has More Overall ratings than that of right one. • Players having the position of striker has the Maximum count than other Positions. • International players with preferred foot as right have more opportunities than left foot ones. • Teams with Only one International reputation has the highest Overall Ratings. • The club Real Madrid has highest Overall ratings than other Popular clubs. • the top Football Players from the FIFA 19 game are Messi and Ronaldo and their age lies between 30-35(based on Overall Ratings). • Brazil has the players with highest overall rating around more than 70 points. • Players with work rate medium/medium has the maximum participation in FIFA 2019. • As the weight goes on increasing the Dribbling skill of the players is decreasing and in very few players after 205lbs weight are good at dribbling. • L.Messi has the highest wage among all the players. • From the above graph we conclude that most of the players belong to ENGLAND more than 1600. • The Players within the age 22-30 have the highest Potential. • density of the potential of players is maximum at 70 .And players with preferred foot left has higher Potential than right ones. • As the Overall ratings of the Players increase . Their values also increases. • Players between 20-32 who have the preferred foot as Left has the highest potential then that of right foot players. • Here the nationality England has the player count 1662 and overall rating of about 63.42. • Here the TOP 5 Clubs are taken and see Juventus their young players are tooo less their maximum players age is above 25 years. And in Paris club young players can be seen on a large number. • Argentina is the country which keeps the players from all age groups ranging from (18-40).And Most of the players from Argentina are in the age group of 22-32. • Messi is winning since skill Values of messi are greater than that of ronaldo here. • Players of height 6'0 foot have the highest count. Around 2700 have height of 6'0 foot. • count of players with age 20-26 is maximum and their preferred foot is right.